Outlier Data

Re-experiment Smart: a Novel Method to Enhance Data-driven Prediction of Mechanical Properties of Epoxy Polymers

Cui, Wanshan, Jeong, Yejin, Song, Inwook, Kim, Gyuri, Kwon, Minsang, Lee, Donghun

arXiv.org Artificial Intelligence

Accurate prediction of polymer material properties through data-driven approaches greatly accelerates novel material development by reducing redundant experiments and trial-and-error processes. However, the reliability of such predictions is limited by the quality of the experimental dataset. To address this limitation, we propose a novel approach to enhance dataset quality efficiently by integrating multi-algorithm outlier detection with selective re-experimentation of unreliable outlier cases. To demonstrate its general applicability, we report performance improvements across multiple machine learning models, including Elastic Net, SVR, Random Forest, and TPOT, in predicting three key mechanical properties. Our method reliably reduces prediction error (RMSE) and significantly improves accuracy with minimal additional experimental work, requiring only about 5% of the dataset to be re-measured. These findings highlight the importance of data quality enhancement in achieving reliable machine learning applications in polymer science and present a scalable strategy for improving predictive reliability in materials science.

Introduction. Epoxy adhesives are extensively utilized in a wide range of industries, including automotive, aerospace, and civil engineering, due to their robust adhesion to various substrates, exceptional mechanical properties, and high resistance to heat, corrosion, and chemicals [1-4]. Primarily composed of epoxy resin and a hardener (curing agent), epoxy adhesives may incorporate additional additives, such as accelerators and fillers, for modification [5]. Epoxy adhesives are formulated by subjecting their compositions to a curing process, which can occur at room temperature, at elevated temperature, or through alternative methods such as exposure to UV light [6].
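The abstract does not specify which detectors are combined, but the core idea of multi-algorithm consensus can be sketched with two simple stdlib detectors (z-score and Tukey IQR fences); a measurement is flagged for re-experimentation only when multiple detectors agree. The thresholds and the toy tensile-strength values below are illustrative assumptions, not the paper's settings.

```python
import statistics

def zscore_flags(values, thresh=2.0):
    """Flag points more than `thresh` standard deviations from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mu) / sd > thresh for v in values]

def iqr_flags(values, k=1.5):
    """Flag points outside the Tukey fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v < lo or v > hi for v in values]

def consensus_outliers(values, min_votes=2):
    """A point is an outlier only if flagged by at least `min_votes` detectors."""
    detectors = [zscore_flags(values), iqr_flags(values)]
    votes = [sum(flags[i] for flags in detectors) for i in range(len(values))]
    return [i for i, v in enumerate(votes) if v >= min_votes]

strengths = [52.1, 50.8, 51.5, 49.9, 50.3, 51.0, 78.4, 50.6]  # toy tensile strengths (MPa)
print(consensus_outliers(strengths))  # -> [6]: only the 78.4 MPa measurement
```

Requiring agreement between detectors keeps the flagged set small, which matches the paper's point that only about 5% of the dataset needs re-measurement.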


Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data

Kim, Jang-Hyun, Yun, Sangdoo, Song, Hyun Oh

arXiv.org Artificial Intelligence

Diagnosing and cleaning data is a crucial step for building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to the presence of complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying problematic data by utilizing a largely ignored source of information: the relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of the data. We further introduce a visualization tool that provides contextual information about a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performance of our approach on large-scale image, speech, and language tasks, including ImageNet, ESC-50, and SST2. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. We release code at https://github.com/snu-mllab/Neural-Relation-Graph.
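The paper's actual algorithm builds a relational graph over feature embeddings; as a minimal sketch of the underlying intuition (not the authors' method), a point whose nearest neighbours in embedding space overwhelmingly carry a different label is a label-error candidate. The embeddings, labels, and `k` below are invented toy values.

```python
import math

def knn_label_disagreement(embeddings, labels, k=3):
    """Score each point by the fraction of its k nearest neighbours (in
    feature space) whose label disagrees with its own; a high score
    suggests a possible label error."""
    scores = []
    for i, (e, y) in enumerate(zip(embeddings, labels)):
        dists = sorted(
            (math.dist(e, embeddings[j]), labels[j])
            for j in range(len(embeddings)) if j != i
        )
        neighbours = [lbl for _, lbl in dists[:k]]
        scores.append(sum(lbl != y for lbl in neighbours) / k)
    return scores

# Two tight clusters; the last point sits in cluster 0 but is labelled 1.
emb = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (0.05, 0.05)]
lab = [0, 0, 0, 1, 1, 1]
scores = knn_label_disagreement(emb, lab)
print(scores)  # the final (mislabelled) point gets the highest score, 1.0
```

Ranking by this score and reviewing the top fraction is the standard workflow for label-noise triage; the paper's relation-graph formulation refines this with similarity kernels and scalability tricks.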


Three Factors to Improve Out-of-Distribution Detection

Choi, Hyunjun, Chung, JaeHo, Jeong, Hawook, Choi, Jin Young

arXiv.org Artificial Intelligence

In the problem of out-of-distribution (OOD) detection, the usage of auxiliary data as outlier data for fine-tuning has demonstrated encouraging performance. However, previous methods have suffered from a trade-off between classification accuracy (ACC) and OOD detection performance (AUROC, FPR, AUPR). To improve this trade-off, we make three contributions: (i) Incorporating a self-knowledge distillation loss can enhance the accuracy of the network; (ii) Sampling semi-hard outlier data for training can improve OOD detection performance with minimal impact on accuracy; (iii) The introduction of our novel supervised contrastive learning can simultaneously improve OOD detection performance and the accuracy of the network. By incorporating all three factors, our approach enhances both accuracy and OOD detection performance by addressing the trade-off between classification and OOD detection. Our method achieves improvements over previous approaches in both performance metrics.
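Contribution (ii), semi-hard outlier sampling, can be illustrated with a small sketch. The paper's criterion is defined in terms of network outputs; here we assume only that each auxiliary outlier has a scalar difficulty score, and keep the middle band: not trivially easy, not so hard that training on it erodes classification accuracy. The percentile cut-offs and scores are illustrative assumptions.

```python
def sample_semi_hard(outlier_scores, lo_pct=0.25, hi_pct=0.75):
    """Keep auxiliary outliers whose difficulty score falls in a middle
    percentile band, discarding the easiest and hardest examples."""
    ranked = sorted(range(len(outlier_scores)), key=lambda i: outlier_scores[i])
    lo = int(len(ranked) * lo_pct)
    hi = int(len(ranked) * hi_pct)
    return ranked[lo:hi]

scores = [0.05, 0.92, 0.40, 0.55, 0.71, 0.10, 0.63, 0.33]
print(sorted(sample_semi_hard(scores)))  # -> [2, 3, 6, 7]
```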


Data-centric Operational Design Domain Characterization for Machine Learning-based Aeronautical Products

Kaakai, Fateh, Adibhatla, Shridhar "Shreeder", Pai, Ganesh, Escorihuela, Emmanuelle

arXiv.org Artificial Intelligence

We give a first rigorous characterization of Operational Design Domains (ODDs) for Machine Learning (ML)-based aeronautical products. Unlike in other application sectors (such as self-driving road vehicles) where ODD development is scenario-based, our approach is data-centric: we propose the dimensions along which the parameters that define an ODD can be explicitly captured, together with a categorization of the data that ML-based applications can encounter in operation, whilst identifying their system-level relevance and impact. Specifically, we discuss how those data categories are useful to determine: the requirements necessary to drive the design of ML Models (MLMs); the potential effects on MLMs and higher levels of the system hierarchy; the learning assurance processes that may be needed, and system architectural considerations. We illustrate the underlying concepts with an example of an aircraft flight envelope.


Mixture Outlier Exposure: Towards Out-of-Distribution Detection in Fine-grained Environments

Zhang, Jingyang, Inkawhich, Nathan, Linderman, Randolph, Chen, Yiran, Li, Hai

arXiv.org Artificial Intelligence

Many real-world scenarios in which DNN-based recognition systems are deployed have inherently fine-grained attributes (e.g., bird-species recognition, medical image classification). In addition to achieving reliable accuracy, a critical subtask for these models is to detect Out-of-distribution (OOD) inputs. Given the nature of the deployment environment, one may expect such OOD inputs to also be fine-grained w.r.t. the known classes (e.g., a novel bird species), which are thus extremely difficult to identify. Unfortunately, OOD detection in fine-grained scenarios remains largely underexplored. In this work, we aim to fill this gap by first carefully constructing four large-scale fine-grained test environments, in which existing methods are shown to have difficulties. In particular, we find that even explicitly incorporating a diverse set of auxiliary outlier data during training does not provide sufficient coverage of the broad region where fine-grained OOD samples lie. We then propose Mixture Outlier Exposure (MixOE), which mixes ID data and training outliers to expand the coverage of different OOD granularities, and trains the model such that the prediction confidence decays linearly as the input transitions from ID to OOD. Extensive experiments and analyses demonstrate the effectiveness of MixOE for building OOD detectors in fine-grained environments. The code is available at https://github.com/zjysteven/MixOE.
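The mixing-with-linear-confidence-decay idea reads naturally as a mixup between an ID sample and a training outlier, with a soft target that interpolates between the ID one-hot label and the uniform distribution. This is a minimal sketch of that construction on plain lists; the actual MixOE implementation operates on image tensors and includes CutMix-style variants, and the inputs below are invented toy values.

```python
import random

def mixoe_example(x_id, y_id, x_ood, num_classes, lam=None):
    """Mix one ID sample with one training outlier (mixup-style) and build
    a soft target that decays linearly from the ID one-hot toward the
    uniform distribution as the input moves from ID to OOD."""
    if lam is None:
        lam = random.random()  # mixing coefficient in [0, 1)
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(x_id, x_ood)]
    uniform = 1.0 / num_classes
    target = [lam * (1.0 if c == y_id else 0.0) + (1 - lam) * uniform
              for c in range(num_classes)]
    return x_mix, target

x_mix, target = mixoe_example([1.0, 2.0], 0, [3.0, -1.0], num_classes=4, lam=0.5)
print(x_mix)    # [2.0, 0.5]
print(target)   # [0.625, 0.125, 0.125, 0.125]
```

At lam = 1 the target is the pure one-hot label; at lam = 0 it is fully uniform, which is exactly the "confidence decays linearly from ID to OOD" behaviour the abstract describes.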


How Should We Detect and Treat the Outliers?

#artificialintelligence

How should we detect outliers? How should we treat them? An outlier is a data point or observation that behaves very differently from the rest of the data. If we are finding the average net worth of a group of people, and Elon Musk happens to be in that group, the entire analysis will be skewed by that single outlier. This is why outliers should be handled properly before building a machine learning model.
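The net-worth example can be made concrete with a few lines of code: one extreme value drags the mean far from every typical group member, while the median is robust to it. The numbers below are invented for illustration.

```python
import statistics

# Net worths in millions of dollars; one billionaire-scale outlier.
net_worth = [0.05, 0.08, 0.12, 0.10, 0.07, 250.0]

print(round(statistics.fmean(net_worth), 2))   # 41.74 -- the single outlier dominates
print(round(statistics.median(net_worth), 2))  # 0.09  -- reflects a typical member
```

This is why robust summary statistics (or removing/capping flagged outliers) come before model building: a mean-based feature here would describe none of the actual group members.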